I constructed a transcriptome for each tissue as follows: run StringTie on each sample, then merge the resulting GTFs by subtissue using the GENCODE annotation as a reference, with a cutoff of at least 1 TPM in each tissue. I then evaluated the result using our Salmon expression metrics. As expected, transcript construction is largely similar between subtissues of the same tissue type.
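The TPM cutoff can be sketched as below. This is a minimal illustration, not the pipeline itself: the function name, the input layout (a dict of per-sample quantifications), and the choice to average TPM across the samples of a subtissue are all assumptions, since the notes only state the threshold of 1 TPM.

```python
from statistics import mean


def transcripts_passing_tpm(tpm_by_sample, min_tpm=1.0):
    """Return the transcript IDs whose mean TPM across samples is >= min_tpm.

    tpm_by_sample: dict mapping sample name -> {transcript ID: TPM}.
    Assumption: a transcript missing from a sample counts as 0 TPM, and
    samples are aggregated by their mean (the notes do not say how).
    """
    # Collect every transcript ID seen in any sample.
    transcript_ids = set()
    for quant in tpm_by_sample.values():
        transcript_ids.update(quant)

    kept = set()
    for tx in transcript_ids:
        values = [quant.get(tx, 0.0) for quant in tpm_by_sample.values()]
        if mean(values) >= min_tpm:
            kept.add(tx)
    return kept


# Hypothetical quantifications for two samples of one subtissue.
example = {
    "sample_1": {"tx1": 2.0, "tx2": 0.5},
    "sample_2": {"tx1": 1.0, "tx2": 0.1, "tx3": 4.0},
}
kept_example = transcripts_passing_tpm(example)
```

Swapping the mean for a stricter rule (for example, requiring the cutoff in every sample) only changes the single comparison inside the loop.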
I define novel transcripts as novel variants of a known gene. Novel transcripts fall into 2 major classes: modification of previously annotated exons to create novel, unannotated exons, or a rearrangement of known exons into a novel arrangement.
Novel loci are entirely novel transcribed regions of the genome, which can also be broken down into protein-coding vs. non-protein-coding.
Novel transcripts are novel due to one or more novel/unannotated exons. A single novel exon can be associated with multiple transcripts, and can fall in either the protein-coding or untranslated region of a transcript, or be part of a non-coding transcript.
I then wanted to identify what biological processes might be driving the formation of these novel exons. The 3 major sources of transcriptional variation are alternative splicing (retained intron (RI) or alternative splice sites (A3SS/A5SS)), alternative promoters (novel_TSS), and alternative polyadenylation (novel_TES). Novel exons arise from some combination of the above processes, or are completely novel regions of transcribed sequence.
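One way to make these categories concrete is to compare the boundaries of each novel exon against the reference exons of the same gene. The sketch below is a simplified illustration under stated assumptions, not the classification actually used here: the function name, the `is_first`/`is_last` flags, and the extra labels `mixed` and `fully_novel` are hypothetical, and exons are plain `(start, end)` genomic coordinates.

```python
def classify_novel_exon(exon, ref_exons, strand="+", is_first=False, is_last=False):
    """Assign a coarse origin label to one exon of a known gene.

    exon: (start, end) genomic coordinates with start < end.
    ref_exons: list of (start, end) reference exons for the same gene.
    is_first / is_last: whether the exon is the first or last exon of its
    transcript in transcription order (assumed to be known by the caller).
    """
    start, end = exon
    if exon in set(ref_exons):
        return "reference"

    ref_starts = {s for s, _ in ref_exons}
    ref_ends = {e for _, e in ref_exons}

    # Retained intron: the exon runs from a known start to a known end and
    # fully contains at least two reference exons (so it spans an intron).
    contained = [r for r in ref_exons if start <= r[0] and r[1] <= end]
    if start in ref_starts and end in ref_ends and len(contained) >= 2:
        return "RI"

    # Translate genomic boundaries into the exon's 5' and 3' boundaries.
    if strand == "+":
        five_prime_known = start in ref_starts
        three_prime_known = end in ref_ends
    else:
        five_prime_known = end in ref_ends
        three_prime_known = start in ref_starts

    if five_prime_known and three_prime_known:
        return "mixed"  # both boundaries known, but never paired in the reference
    if five_prime_known:
        # The exon's 3' boundary changed: donor site, or transcript end if last.
        return "novel_TES" if is_last else "A5SS"
    if three_prime_known:
        # The exon's 5' boundary changed: acceptor site, or transcript start if first.
        return "novel_TSS" if is_first else "A3SS"
    return "fully_novel"


# Hypothetical reference exons for one gene.
ref = [(100, 200), (300, 400), (500, 600)]
labels = [
    classify_novel_exon((100, 400), ref),
    classify_novel_exon((300, 450), ref),
    classify_novel_exon((50, 200), ref, is_first=True),
]
```

Note that on the minus strand the same genomic change flips category, which is why the boundaries are reoriented before the comparison.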
Next, using the constructed transcriptomes for each tissue, I analyzed splicing patterns for both reference and novel exons.
Heatmap
While some exons are expressed ubiquitously across tissues, I can see clear regions in the heatmap that denote tissue-specific splicing.
Looking specifically at the novel exons in the transcriptomes that I had determined were due to splicing, I wanted to find out how much they are actually being used, so I defined "used" as having a PSI >= 0.1. One big thing I saw was that some of the exons I had annotated as originating from alternative splicing were not considered alternatively spliced by rMATS (which uses the GTF to globally define all alternative splicing events). Cornea fetal is not included because it has no paired-end samples, which I need for rMATS.
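The "used" criterion can be written down directly. The sketch below computes PSI (percent spliced in) as the length-normalized fraction of reads supporting inclusion and then applies the 0.1 threshold. The function names and the default effective lengths of 1 are assumptions for illustration; in practice the counts and lengths would come from the rMATS output.

```python
def psi(inc_reads, skip_reads, inc_len=1.0, skip_len=1.0):
    """Percent spliced in: normalized inclusion reads over all supporting reads.

    inc_len / skip_len are the effective lengths of the inclusion and
    skipping isoforms; the defaults of 1 leave the counts unnormalized.
    Returns None when no reads support either form.
    """
    inc = inc_reads / inc_len
    skip = skip_reads / skip_len
    total = inc + skip
    if total == 0:
        return None
    return inc / total


def is_used(psi_value, min_psi=0.1):
    """An exon counts as used when its PSI is defined and at least min_psi."""
    return psi_value is not None and psi_value >= min_psi


# Hypothetical read counts for three exons.
used_flags = [
    is_used(psi(10, 90)),  # PSI = 0.1, at the threshold
    is_used(psi(5, 95)),   # PSI = 0.05, below the threshold
    is_used(psi(0, 0)),    # no supporting reads, PSI undefined
]
```

Treating an undefined PSI as "not used" is a design choice of this sketch; such events could also be tracked separately as undetected.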
Next, I looked at rMATS events that were either not detected or not minimally used. Each point is a tissue. Retained intron events are the most commonly undetected or lowly used event type.
Next, I wanted to identify functional differences between the transcripts constructed in each tissue. I selected all transcripts expressed in the fetal eye and adult eye (Retina, Cornea, RPE), then identified genes that had a novel transcript in either fetal or adult, and found enriched gene ontology terms.
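The notes do not say which enrichment method was used. As one common possibility, a one-sided hypergeometric test asks whether a gene ontology term is over-represented among the genes with a novel transcript, relative to a background set. The sketch below is an assumption-labeled illustration of that test, with a hypothetical function name and made-up counts, not a description of the analysis that was run.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(overlap >= n_overlap).

    n_background: number of genes in the background set.
    n_term: background genes annotated with the GO term.
    n_selected: genes in the selected set (e.g. genes with a novel transcript).
    n_overlap: selected genes annotated with the GO term.
    """
    max_overlap = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 3 in the term, 3 selected, all 3 overlap.
p_example = hypergeom_enrichment_p(10, 3, 3, 3)
```

With many terms tested at once, the resulting p-values would still need a multiple-testing correction before calling a term enriched.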
Next, I used the quantification data from eyeIntegration to analyze novel loci in the developing eye. I broke the fetal retina time series into early, mid, and late stages and looked for differentially expressed genes.